Practical Entropy-Compressed Rank/Select Dictionary

Authors

  • Daisuke Okanohara
  • Kunihiko Sadakane
Abstract

Rank/Select dictionaries are data structures for an ordered set S ⊆ {0, 1, ..., n − 1} that compute rank(x, S) (the number of elements in S that are no greater than x) and select(i, S) (the i-th smallest element in S). They are fundamental components of succinct data structures for strings, trees, graphs, etc. In those data structures, however, only asymptotic behavior has been considered, and their performance on real data is not satisfactory. In this paper, we propose four novel Rank/Select dictionaries, esp, recrank, vcode and sdarray, each of which is small if the number of elements in S is small, and indeed close to nH0(S) in practice (H0(S) ≤ 1 is the zero-th order empirical entropy of S), and whose query time is superior to that of the previous ones. Experimental results reveal the characteristics of our data structures and also show that these data structures are superior to existing implementations in both size and query time.
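
The definitions of rank and select in the abstract can be illustrated with a deliberately naive sketch. This is not any of the four structures proposed in the paper (esp, recrank, vcode, sdarray); it stores S as a plain sorted list and is meant only to pin down what the two queries return. The class name NaiveRankSelect, the 1-based indexing of select, and the h0 helper (which uses the standard zero-order empirical entropy of the characteristic bit vector of S, a formula the abstract does not spell out) are assumptions made for illustration.

```python
from bisect import bisect_right
from math import log2


class NaiveRankSelect:
    """Reference semantics only: uncompressed, O(log m) rank, O(1) select."""

    def __init__(self, elements, n):
        # S is a subset of {0, 1, ..., n - 1}; duplicates are dropped.
        self.n = n
        self.sorted = sorted(set(elements))
        if self.sorted and not (0 <= self.sorted[0] and self.sorted[-1] < n):
            raise ValueError("all elements must lie in {0, ..., n - 1}")

    def rank(self, x):
        # Number of elements in S that are no greater than x.
        return bisect_right(self.sorted, x)

    def select(self, i):
        # i-th smallest element of S, with i counted from 1 (an assumption).
        if not 1 <= i <= len(self.sorted):
            raise IndexError("select index out of range")
        return self.sorted[i - 1]

    def h0(self):
        # Zero-order empirical entropy per position of the bit vector of S,
        # so n * h0() is the space target the abstract refers to as nH0(S).
        m = len(self.sorted)
        if m == 0 or m == self.n:
            return 0.0
        p = m / self.n
        return -p * log2(p) - (1 - p) * log2(1 - p)


d = NaiveRankSelect([2, 5, 7], n=10)
print(d.rank(5), d.select(3))
```

With these definitions, rank(select(i, S), S) = i for every valid i, which is a convenient sanity check for any compressed implementation.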

Similar resources

A Compressed-Gap Data-Aware Measure for Indexable Dictionaries

We consider the problem of building a compressed fully-indexable dictionary over a set S of n items out of a universe U = {0, ..., u − 1}. We use gap-encoding combined with entropy compression in order to reduce the space of our structures. Let H0 be the zero-order empirical entropy of the gap stream. We observe that nH0 ∈ o(gap) if the gaps are highly compressible, and prove that nH0 ≤ n lo...

High-Order Entropy Compressed Bit Vectors with Rank/Select

We design practical implementations of data structures for compressing bit-vectors to support efficient rank-queries (counting the number of ones up to a given point). Unlike previous approaches, which either store the bit vectors plainly, or focus on compressing bit-vectors with low densities of ones or zeros, we aim at low entropies of higher order, for example 101010...10. Our implementa...

Practical Rank/Select Queries over Arbitrary Sequences

We present a practical study on the compact representation of sequences supporting rank, select, and access queries. While there are several theoretical solutions to the problem, only a few have been tried out, and there is little idea on how the others would perform, especially in the case of sequences with very large alphabets. We first present a new practical implementation of the compressed...

Rank and select: Another lesson learned

Rank and select queries on bitmaps are essential building blocks of many compressed data structures, including text indexes, membership and range supporting spatial data structures, compressed graphs, and more. Considered theoretically as early as the 1980s, these primitives have also been a subject of active research concerning their practical incarnations in the last decade. We present a few novel rank...

Alphabet Partitioning for Compressed Rank/Select and Applications

We present a data structure that stores a string s[1..n] over the alphabet [1..σ] in nH0(s) + o(n)(H0(s) + 1) bits, where H0(s) is the zero-order entropy of s. This data structure supports the queries access and rank in time O(lg lg σ), and the select query in constant time. This result improves on previously known data structures using nH0(s) + o(n lg σ) bits, where on highly compressible insta...

Journal title:
  • CoRR

Volume abs/cs/0610001

Publication date 2007